Stochastic Mirror Descent
On the Generalization Error of Stochastic Mirror Descent for Quadratically-Bounded Losses: an Improved Analysis
In this work, we revisit the generalization error of stochastic mirror descent for quadratically bounded losses studied in Telgarsky (2022). Quadratically bounded losses form a broad class of loss functions, capturing both Lipschitz and smooth functions, for both regression and classification problems. We study the high-probability generalization of this class of losses on linear predictors, in both realizable and non-realizable cases, when the data are sampled i.i.d. or from a Markov chain. The prior work relies on an intricate coupling argument between the iterates of the original problem and those projected onto a bounded domain. This approach enables black-box application of concentration inequalities, but also leads to suboptimal guarantees, due in part to the use of a union bound across all iterations.
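For concreteness, the stochastic mirror descent iteration discussed throughout this listing can be sketched as follows. This is a minimal illustration assuming a squared loss on linear predictors and an entropic mirror potential on the positive orthant; the function name, data model, and step size are placeholders, not the setting analyzed in the paper above.

```python
import numpy as np

# Illustrative sketch only (assumptions: squared loss, entropic potential on the
# positive orthant); not the exact setting of the paper above.

def smd_exponentiated_gradient(X, y, eta=0.1, n_steps=1000, seed=0):
    """SMD with potential psi(w) = sum_i (w_i log w_i - w_i), so grad psi(w) = log w."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.ones(d) / d                    # start at a uniform positive point
    for _ in range(n_steps):
        i = rng.integers(n)               # sample one example (the i.i.d. case)
        g = (X[i] @ w - y[i]) * X[i]      # stochastic gradient of 1/2 (x_i^T w - y_i)^2
        # mirror step: grad psi(w_next) = grad psi(w) - eta * g  =>  w_next = w * exp(-eta * g)
        w = w * np.exp(-eta * g)
    return w
```

With the Euclidean potential psi(w) = (1/2)||w||^2 the same mirror step reduces to ordinary SGD, which is why SGD is repeatedly described in the entries below as a special case of SMD.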
Interpreting prediction markets: a stochastic approach
We strengthen recent connections between prediction markets and learning by showing that a natural class of market makers can be understood as performing stochastic mirror descent when trader demands are sequentially drawn from a fixed distribution. This provides new insights into how market prices (and price paths) may be interpreted as a summary of the market's belief distribution by relating them to the optimization problem being solved. In particular, we show that under certain conditions the stationary point of the stochastic process of prices generated by the market is equal to the market's Walrasian equilibrium from classical market analysis. Together, these results suggest how traditional market making mechanisms might be replaced with general purpose learning algorithms while still retaining guarantees about their behaviour.
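As a purely illustrative aside, one concrete cost-function-based market maker is the Logarithmic Market Scoring Rule (LMSR), whose prices are the gradient of a convex cost. The sketch below shows how such prices evolve under a stream of trades; the LMSR choice and the i.i.d. demand model are assumptions used for illustration, not the paper's construction.

```python
import numpy as np

# Illustrative only: LMSR cost-function market maker with prices given by the
# gradient of C(q) = b * log(sum_i exp(q_i / b)). The demand model is an assumption.

def lmsr_prices(q, b=10.0):
    """Instantaneous prices p = grad C(q): a softmax of the outstanding share inventory q."""
    z = q / b
    e = np.exp(z - z.max())               # numerically stabilized softmax
    return e / e.sum()

rng = np.random.default_rng(0)
q = np.zeros(3)                            # three-outcome market, empty inventory
for _ in range(100):
    q += 0.5 * rng.normal(size=3)          # one trader's (assumed i.i.d.) demand
print(lmsr_prices(q))                      # prices sum to 1 and read as a belief over outcomes
```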
Stochastic Mirror Descent in Variationally Coherent Optimization Problems
In this paper, we examine a class of non-convex stochastic optimization problems which we call variationally coherent, and which properly includes pseudo-/quasi-convex and star-convex optimization problems. To solve such problems, we focus on the widely used stochastic mirror descent (SMD) family of algorithms (which contains stochastic gradient descent as a special case), and we show that the last iterate of SMD converges to the problem's solution set with probability 1. This result contributes to the landscape of non-convex stochastic optimization by clarifying that neither pseudo-/quasi-convexity nor star-convexity is essential for (almost sure) global convergence; rather, variational coherence, a much weaker requirement, suffices. Characterization of convergence rates for the subclass of strongly variationally coherent optimization problems as well as simulation results are also presented.
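For readers unfamiliar with the term, the variational coherence condition is commonly stated as below; the precise quantifiers should be checked against the paper, so treat this as a hedged reminder rather than the authors' exact definition.

```latex
% Hedged statement of variational coherence (commonly cited form; check the paper
% for the exact quantifiers and regularity assumptions).
\[
\langle \nabla f(x),\, x - x^{*} \rangle \;\ge\; 0
\quad \text{for all } x \in \mathcal{X},\; x^{*} \in \mathcal{X}^{*},
\]
with equality only when $x \in \mathcal{X}^{*}$, where
$f(x) = \mathbb{E}_{\xi}\!\left[F(x,\xi)\right]$ and
$\mathcal{X}^{*} = \arg\min_{x \in \mathcal{X}} f(x)$.
```

Every convex, pseudo-/quasi-convex, and star-convex problem satisfies a condition of this form, which is why the abstract describes it as a strictly weaker requirement.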
Stochastic Mirror Descent in Average Ensemble Models
Taylan Kargin, Fariborz Salehi, Babak Hassibi
The stochastic mirror descent (SMD) algorithm is a general class of training algorithms, which includes the celebrated stochastic gradient descent (SGD) as a special case. It utilizes a mirror potential to influence the implicit bias of the training algorithm. In this paper we explore the performance of the SMD iterates on mean-field ensemble models. Our results generalize earlier ones obtained for SGD on such models. The evolution of the distribution of parameters is mapped to a continuous time process in the space of probability distributions. Our main result gives a nonlinear partial differential equation to which the continuous time process converges in the asymptotic regime of large networks. The impact of the mirror potential appears through a multiplicative term that is equal to the inverse of its Hessian and which can be interpreted as defining a gradient flow over an appropriately defined Riemannian manifold. We provide numerical simulations which allow us to study and characterize the effect of the mirror potential on the performance of networks trained with SMD for some binary classification problems.
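A single-particle caricature of the inverse-Hessian effect described above is the classical mirror flow; the paper's actual result is a mean-field PDE over the parameter distribution, so the ODE below is only meant to indicate where the multiplicative factor comes from.

```latex
% Standard single-particle mirror-flow caricature (not the paper's mean-field PDE).
\[
\frac{\mathrm{d}}{\mathrm{d}t}\,\nabla\psi\!\left(w(t)\right) = -\nabla L\!\left(w(t)\right)
\quad\Longleftrightarrow\quad
\dot{w}(t) = -\left[\nabla^{2}\psi\!\left(w(t)\right)\right]^{-1}\nabla L\!\left(w(t)\right),
\]
so the mirror potential $\psi$ enters through the factor $\left[\nabla^{2}\psi\right]^{-1}$,
i.e.\ as a gradient flow in the Riemannian metric defined by $\nabla^{2}\psi$.
```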
Stochastic Mirror Descent for Large-Scale Sparse Recovery
Sasila Ilandarideva, Yannis Bekri, Anatoli Juditsky, Vianney Perchet
In this paper we discuss an application of Stochastic Approximation to statistical estimation of high-dimensional sparse parameters. The proposed solution reduces to resolving a penalized stochastic optimization problem on each stage of a multistage algorithm, each problem being solved to a prescribed accuracy by the non-Euclidean Composite Stochastic Mirror Descent (CSMD) algorithm. Assuming that the problem objective is smooth and quadratically minorated and that the stochastic perturbations are sub-Gaussian, our analysis prescribes the method parameters which ensure fast convergence of the estimation error (the radius of a confidence ball of a given norm around the approximate solution). This convergence is linear during the first "preliminary" phase of the routine and is sublinear during the second "asymptotic" phase. We consider an application of the proposed approach to the sparse Generalized Linear Regression problem. In this setting, we show that the proposed algorithm attains the optimal convergence of the estimation error under weak assumptions on the regressor distribution. We also present a numerical study illustrating the performance of the algorithm on high-dimensional simulation data.
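To illustrate the composite step structure, the sketch below implements one composite stochastic mirror descent update with an l1 penalty using the Euclidean potential, in which case it reduces to soft-thresholding; the paper's CSMD uses a non-Euclidean potential tailored to sparse recovery, so this is only a structural stand-in under that assumption.

```python
import numpy as np

# Illustrative composite SMD step with an l1 penalty and the *Euclidean* potential,
# where the step reduces to soft-thresholding; the paper's CSMD is non-Euclidean.

def soft_threshold(v, tau):
    """Entrywise shrinkage: the proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def csmd_step(x, grad, eta, lam):
    """argmin_u  eta*<grad, u> + eta*lam*||u||_1 + (1/2)*||u - x||^2."""
    return soft_threshold(x - eta * grad, eta * lam)
```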
Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent
Yiling Luo, Xiaoming Huo, Yajun Mei
In statistics and machine learning, it is common to optimize an objective function that is a finite sum. Stochastic mirror descent (SMD) efficiently optimizes such an objective by using a subset of the data to perform a one-step update of the variable/parameter. Further applying the variance reduction technique to SMD yields the VRSMD algorithm, which enjoys fast convergence [1], [2]. Implicit regularization is a relatively new concept [3] that explains why the result of an algorithm generalizes well in some overparameterized models [3], [4]. It refers to the fact that an algorithm can automatically select a minimum-norm solution, which is not explicitly induced by the objective function. There are works on implicit regularization for Gradient Descent [5]-[8], Stochastic Gradient Descent [9]-[12], and Stochastic Mirror Descent [13]. Considering the computational advantage of VRSMD compared to all the algorithms above, it would be even better if VRSMD also had this useful implicit regularization property. From a technical point of view, our work contains the following two results: in linear regression (including underfitting and overfitting), we show that the solution sequence of VRSMD converges to the minimum mirror interpolant, which is the implicit regularization property of VRSMD, and we also specify the convergence rate.
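A hedged sketch of what a variance-reduced SMD loop can look like is given below, combining an SVRG-style control variate with an entropic mirror map; the loss, mirror map, and epoch schedule are illustrative assumptions rather than the specific VRSMD variant of [1], [2].

```python
import numpy as np

# Hedged sketch of a variance-reduced SMD loop (SVRG-style control variate plus an
# entropic mirror step). Loss, mirror map, and schedule are illustrative assumptions.

def grad_i(w, X, y, i):
    return (X[i] @ w - y[i]) * X[i]                    # gradient of 1/2 (x_i^T w - y_i)^2

def vrsmd(X, y, eta=0.05, n_epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.ones(d) / d
    for _ in range(n_epochs):
        w_ref = w.copy()
        full_grad = X.T @ (X @ w_ref - y) / n          # anchor gradient at the reference point
        for _ in range(n):
            i = rng.integers(n)
            # variance-reduced stochastic gradient: per-example correction plus anchor
            g = grad_i(w, X, y, i) - grad_i(w_ref, X, y, i) + full_grad
            w = w * np.exp(-eta * g)                   # mirror step with grad psi(w) = log w
    return w
```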